
Edinburgh Research Explorer

Correcting the Bias of Empirical Frequency Parameter Estimators in Codon Models

Author: C Kosiol
C Seoighe
G Schwarz
GC Conant
Konrad Scheffler
M Anisimova
M Lacerda
N Goldman
S Whelan
Sergei Kosakovsky Pond
SL Kosakovsky Pond
Spencer V. Muse
SV Muse
Thomas Mailund
W Delport
Wayne Delport
Publication venue: Public Library of Science
Publication date: 01/01/2010

Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators is biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a “corrected” empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the -style estimators.
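
The source of the bias described in this abstract can be illustrated with a small sketch. It assumes (the extracted abstract does not name the estimator) that codon frequencies are built as a product of position-specific nucleotide frequencies and then renormalised over the 61 sense codons of the universal genetic code. The function names are hypothetical. The sketch shows only that dropping stop codons makes the nucleotide frequencies implied by the model differ from the observed ones; it is not the authors' corrected estimator.

```python
from itertools import product

NUCS = "ACGT"
STOPS = {"TAA", "TAG", "TGA"}  # stop codons, universal genetic code (assumption)

def product_codon_freqs(pos_freqs):
    """Multiply position-specific nucleotide frequencies, skip stop codons,
    and renormalise so the sense-codon frequencies sum to 1."""
    raw = {}
    for triple in product(NUCS, repeat=3):
        codon = "".join(triple)
        if codon in STOPS:
            continue
        raw[codon] = (pos_freqs[0][triple[0]]
                      * pos_freqs[1][triple[1]]
                      * pos_freqs[2][triple[2]])
    total = sum(raw.values())
    return {codon: f / total for codon, f in raw.items()}

def implied_position_freqs(codon_freqs):
    """Nucleotide frequencies at each codon position implied by codon frequencies."""
    out = [dict.fromkeys(NUCS, 0.0) for _ in range(3)]
    for codon, f in codon_freqs.items():
        for pos, nuc in enumerate(codon):
            out[pos][nuc] += f
    return out

# Toy input: every nucleotide observed at frequency 0.25 at every position.
observed = [{n: 0.25 for n in NUCS} for _ in range(3)]
codon_freqs = product_codon_freqs(observed)
implied = implied_position_freqs(codon_freqs)

# All three stop codons start with T, so only 13 of the 61 sense codons do:
# the implied first-position T frequency is 13/61 (about 0.213), not 0.25.
print(implied[0]["T"])
```

Even with perfectly uniform input, the implied composition no longer matches the observed one, which is the kind of mismatch a stop-codon-aware correction has to remove.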

CiteSeerX

Stellenbosch University SUNScholar Repository

Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes

Author: A Doron-Faigenboim
A Schneider
AL Halpern
AR Kinjo
C Kosiol
Darren Martin
DT Jones
G Bazykin
GC Conant
H Akaike
I Keller
J Adachi
JP Huelsenbeck
K Tamura
L Jin
M Anisimova
M Averof
M Hasegawa
M Kimura
MA Larkin
MO Dayhoff
MW Dimmic
N Goldman
N Rodrigue
N Takahata
NGC Smith
R Grantham
S Guindon
S Miyazawa
S Whelan
Sanzo Miyazawa
SC Choi
SQ Le
SV Muse
T Miyata
TK Seo
W Delport
Z Yang
Publication venue: Public Library of Science (PLoS)
Publication date: 18/03/2011

Empirical substitution matrices represent the average tendencies of substitutions over various protein families by sacrificing gene-level resolution. We develop a codon-based model, in which mutational tendencies of codon, a genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution matrices. Then, selective constraints specific to given proteins are approximated as a linear function of those estimated from the empirical substitution matrices. Akaike information criterion (AIC) values indicate that a model allowing multiple nucleotide changes fits the empirical substitution matrices significantly better. Also, the ML estimates of transition-transversion bias obtained from these empirical matrices are not as large as previously estimated. The selective constraints are characteristic of proteins rather than species. However, their relative strengths among amino acid pairs can be approximated as depending mainly on the amino acid pair rather than on the protein family, because the present model, in which selective constraints are approximated to be a linear function of those estimated from the JTT/WAG/LG/KHG matrices, can provide a good fit to other empirical substitution matrices including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins.
The present codon-based model with the ML estimates of selective constraints and with adjustable nucleotide mutation rates would be useful as a simple substitution model in ML and Bayesian inferences of molecular phylogenetic trees, and enables us to obtain biologically meaningful information at both nucleotide and amino acid levels from codon and protein sequences. Comment: Table 9 in this article includes corrections for errata in the Table 9 published in 10.1371/journal.pone.0017244. Supporting information is attached at the end of the article, and a computer-readable dataset of the ML estimates of selective constraints is available from 10.1371/journal.pone.001724
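
As a reminder of how an AIC comparison like the one in this abstract works, here is a minimal sketch. The log-likelihoods and parameter counts are made-up illustrative numbers, not values from the paper, and the variable names are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits of two nested models to the same data.
aic_single_changes = aic(log_likelihood=-1520.4, n_params=10)
aic_multiple_changes = aic(log_likelihood=-1498.7, n_params=13)

# The richer model is preferred only if its likelihood gain outweighs
# the penalty of 2 per extra parameter.
preferred = ("multiple changes"
             if aic_multiple_changes < aic_single_changes
             else "single changes")
print(aic_single_changes, aic_multiple_changes, preferred)
```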

arXiv.org e-Print Archive

Episodic Evolution and Adaptation of Chloroplast Genomes in Ancestral Grasses

Author: A Vicentini
AD Yoder
AJ Drummond
B Rannala
BF Jacobs
Bojian Zhong
BS Gaut
DR Piperno
H Akaike
H Kishino
HP Linder
J Bousquet
J Leebens-Mack
J Zhang
JL Thorne
JP Huelsenbeck
JW Brown
K Bremer
KH Wolfe
M Hasegawa
M Nikaido
Masami Hasegawa
MJ Moore
MJ Sanderson
MM Guisinger
N Goldman
P Erixon
PS Herendeen
RK Jansen
RM Maier
S Aris-Brosou
S-M Chaw
SA Smith
Simon Joly
SS Renner
SV Muse
T Lepage
Takahiro Yonezawa
V Prasad
W Martin
Y Kitazoe
Y Matsuoka
Yang Zhong
Z Yang
Publication venue: Public Library of Science
Publication date: 24/04/2009

It has been suggested that the chloroplast genomes of the grass family, Poaceae, have undergone an elevated evolutionary rate compared to most other angiosperms, yet the details of this phenomenon have remained obscure. To understand how the rate change occurred during evolution, estimation of the time-scale with reliable calibrations is needed. The recent finding of 65 Ma grass phytoliths in Cretaceous dinosaur coprolites places the diversification of the grasses in the Cretaceous period, and provides a reliable calibration for studying the tempo and mode of grass chloroplast evolution. By using chloroplast genome data from angiosperms and by taking account of new paleontological evidence, we now show that episodic rate acceleration both in terms of non-synonymous and synonymous substitutions occurred in the common ancestral branch of the core Poaceae (a group formed by rice, wheat, maize, and their allies) accompanied by adaptive evolution in several chloroplast proteins, while the rate reverted to the slow rate typical of most monocot species in the terminal branches. Our finding of episodic rate acceleration in the ancestral grasses accompanied by adaptive molecular evolution has a profound bearing on the evolution of grasses, which form a highly successful group of plants. The widely used model for estimating divergence times was based on the assumption of correlated rates between ancestral and descendant lineages. However, this assumption proves inadequate for approximating the episodic rate acceleration in the ancestral grasses, and the assumption of independent rates is more appropriate. This finding has implications for studies of molecular evolutionary rates and time-scale of evolution in other groups of organisms.

Identification of physicochemical selective pressure on protein encoding nucleotide sequences

Author: A Shaw
B Clarke
CJ Epstein
L Bernatchez
N Goldman
R Grantham
R Nielsen
R Sainudiin
Raazesh Sainudiin
Rasmus Nielsen
S Henikoff
SV Muse
W Yang
Wendy SW Wong
WHTSAVWTFBP Press
WJ Swanson
YH Lee
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Statistical methods for identifying positively selected sites in protein coding regions are among the most commonly used tools in evolutionary bioinformatics. However, they have been limited by not taking the physicochemical properties of amino acids into account. RESULTS: We develop a new codon-based likelihood model for detecting site-specific selection pressures acting on specific physicochemical properties. Nonsynonymous substitutions are divided into substitutions that differ with respect to the physicochemical properties of interest, and those that do not. The substitution rates of these two types of changes, relative to the synonymous substitution rate, are then described by two parameters, γ and ω respectively. The new model allows us to perform likelihood ratio tests for positive selection acting on specific physicochemical properties of interest. The new method is first used to analyze simulated data and is shown to have good power and accuracy in detecting physicochemical selective pressure. We then re-analyze data from the class-I alleles of the human Major Histocompatibility Complex (MHC) and from the abalone sperm lysin. CONCLUSION: Our new method allows a more flexible framework to identify selection pressure on particular physicochemical properties.
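
The likelihood ratio test mentioned in this abstract can be sketched in a few lines. This is a generic illustration, not the paper's procedure: it assumes the two models are nested and differ by one free parameter, and that a chi-square distribution with one degree of freedom is an adequate null (in practice, boundary constraints on parameters such as γ or ω may call for a different reference distribution). The function name and log-likelihood values are hypothetical.

```python
import math

def lrt_pvalue_df1(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p-value), using the chi-square(1) survival function
    P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical log-likelihoods: null model with the parameter fixed,
# alternative with it estimated freely.
stat, p = lrt_pvalue_df1(lnl_null=-2310.6, lnl_alt=-2306.1)
print(stat, p)
```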

Springer - Publisher Connector

Copenhagen University Research Information System

eScholarship - University of California

Ancient DNA Elucidates the Controversy about the Flightless Island Hens (Gallinula sp.) of Tristan da Cunha

Author: AG Knox
AJ Beintema
Albert J. Beintema
B Slikas
B Taylor
D Bensasson
D Casane
D Posada
Dick S. J. Groenenberg
DL Swofford
EC Dickinson
Edmund Gittenberger
F Ronquist
G Eber
GJ Broekhuysen
J Binladen
J del Hoyo
J Milner
JA Allen
JAA Nylander
Jean-Nicolas Volff
JJ Austin
L Orlando
MD Sorenson
MDB Eldridge
ME Richardson
MJ Nicoll
MTP Gilbert
NT Perna
PL Sclater
R Frankham
René W. R. J. Dekker
RM Sperling
RV Collura
SLK Pond
SO Kolokotronis
SV Muse
TD Kocher
TW Quinn
WP Maddison
Publication venue: Public Library of Science
Publication date: 01/01/2007

A persistent controversy surrounds the flightless island hen of Tristan da Cunha, Gallinula nesiotis. Some believe that it became extinct by the end of the 19th century. Others suppose that it still inhabits Tristan. There is no consensus about Gallinula comeri, the name introduced for the flightless moorhen from the nearby island of Gough. On the basis of DNA sequencing of both recently collected and historical material, we conclude that G. nesiotis and G. comeri are different taxa, that G. nesiotis indeed became extinct, and that G. comeri now inhabits both islands. This study confirms that among gallinules seemingly radical adaptations (such as the loss of flight) can readily evolve in parallel on different islands, while conspicuous changes in other morphological characters fail to occur

CiteSeerX

Leiden University Scholary Publications

Implications of the Plastid Genome Sequence of Typha (Typhaceae, Poales) for Understanding Genome Evolution in Poaceae

Author: A Prombona
APG II
B Ewing
BR Morton
BS Gaut
C Saski
CC Chang
CJ Howe
CM Bowman
D Gordon
D Swofford
D Verma
DE Soltis
DF Garvin
E Belda
E Bortiri
EA Kellogg
EB Knox
EM Rubin
F Quigley
F Wu
FR Khazi
GM Plunkett
GPWG
H Cerutti
H Katayama
H Shimada
J Aii
J Bousquet
J Cao
J Hiratsuka
J Leebens-Mack
J Yu
JD Palmer
Jeffrey L. Boore
Jennifer V. Kuehl
JJ Doyle
KH Wolfe
L Elnitski
LA Raubeson
LG Clark
Mary M. Guisinger
MD Logacheva
ME Cosner
MJ Moore
MM Guisinger
MW Chase
NC Carpita
NP Barker
O Khakhlova
R Bock
R Shao
RC Haberle
RJ Wang
RK Jansen
RM Maier
Robert K. Jansen
S Holm
S Schwartz
S Stefanovic
SA Smith
SE Goulding
SK Wyman
SR Downie
SV Muse
SW Graham
T Asano
TA Heath
TG Barraclough
Timothy W. Chumley
TW Chumley
W Xu
Y Benjamini
Y Matsuoka
Y Ogihara
Z Lin
Z Yang
Publication venue: Springer-Verlag
Publication date: 01/01/2010

Plastid genomes of the grasses (Poaceae) are unusual in their organization and rates of sequence evolution. There has been a recent surge in the availability of grass plastid genome sequences, but a comprehensive comparative analysis of genome evolution has not been performed that includes any related families in the Poales. We report on the plastid genome of Typha latifolia, the first non-grass Poales sequenced to date, and we present comparisons of genome organization and sequence evolution within Poales. Our results confirm that grass plastid genomes exhibit acceleration in both genomic rearrangements and nucleotide substitutions. Poaceae have multiple structural rearrangements, including three inversions, three gene losses (accD, ycf1, ycf2), intron losses in two genes (clpP, rpoC1), and expansion of the inverted repeat (IR) into both large and small single-copy regions. These rearrangements are restricted to the Poaceae, and IR expansion into the small single-copy region correlates with the phylogeny of the family. Comparisons of 73 protein-coding genes for 47 angiosperms including nine Poaceae genera confirm that the branch leading to Poaceae has significantly accelerated rates of change relative to other monocots and angiosperms. Furthermore, rates of sequence evolution within grasses are lower, indicating a deceleration during diversification of the family. Overall there is a strong correlation between accelerated rates of genomic rearrangements and nucleotide substitutions in Poaceae, a phenomenon that has been noted recently throughout angiosperms. The cause of the correlation is unknown, but faulty DNA repair has been suggested in other systems including bacterial and animal mitochondrial genomes.

Springer - Publisher Connector

ScholarWorks at Central Washington University

Whole-Gene Positive Selection, Elevated Synonymous Substitution Rates, Duplication, and Indel Evolution of the Chloroplast clpP1 Gene

Author: A Rambaut
AK Clarke
B Oxelman
Bengt Oxelman
C MacCallum
C Roth
D Higgins
DA Benson
H Kuroda
II APG
J Shaw
J-C deCambiaire
Jean-Nicolas Volff
JN Timmis
JP Huelsenbeck
JZ Zhang
KH Wolfe
LA Raubeson
M Popp
ME Johnson
MF Wojciechowski
MT Clegg
MV Kapralov
P Erixon
Per Erixon
RA Levin
RG Olmstead
S Guindon
SV Muse
T Endo
T Ohta
T Shikanai
VV Goremykin
Y Cho
Y Xing
Z Adam
Z Zhang
ZH Yang
Publication venue: Public Library of Science
Publication date: 01/01/2008

Synonymous DNA substitution rates in the plant chloroplast genome are generally relatively slow and lineage dependent. Non-synonymous rates are usually even slower due to purifying selection acting on the genes. Positive selection is expected to speed up non-synonymous substitution rates, whereas synonymous rates are expected to be unaffected. Until recently, positive selection has seldom been observed in chloroplast genes, and large-scale structural rearrangements leading to gene duplications are hitherto supposed to be rare. genes experiencing negative (purifying) selection are characterized by having very conserved lengths, genes under positive selection often have large insertions of more or less repetitive amino acid sequence motifs. gene and surrounding regions, repetitive amino acid sequences, and increase in synonymous substitution rates. The present study sheds light on the controversial issue of whether negative or positive selection is to be expected after gene duplications by providing evidence for the latter alternative. The observed increase in synonymous substitution rates in some of the lineages indicates that the detection of positive selection may be obscured under such circumstances. Future studies are required to explore the functional significance of the large inserted repeated amino acid motifs, as well as the possibility that synonymous substitution rates may be affected by positive selection

CodonTest: Modeling Amino Acid Substitution Preferences in Coding Sequences

Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of amino acids being exchanged. Recent developments have shown that models which allow for amino acid pairs to have independent rates of substitution offer improved fit over single rate models. However, these approaches have been limited by the necessity for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, the number of which is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths, are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus gene- and organism-specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes.
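
The idea of using a genetic algorithm to assign items to rate classes can be sketched as a toy. This is not the CodonTest implementation: the fitness function below is a hypothetical stand-in (negative within-class squared deviation of made-up per-pair rates) for the likelihood-based model fit the abstract refers to, the number of classes is fixed rather than estimated, and all names and values are illustrative.

```python
import random

def fitness(assignment, rates, n_classes):
    """Toy stand-in for model fit: negative within-class squared deviation."""
    score = 0.0
    for c in range(n_classes):
        members = [r for r, a in zip(rates, assignment) if a == c]
        if members:
            mean = sum(members) / len(members)
            score -= sum((r - mean) ** 2 for r in members)
    return score

def evolve(rates, n_classes, pop_size=30, generations=60, mut_prob=0.1, seed=0):
    """Search for a good assignment of each item to one of n_classes classes."""
    rng = random.Random(seed)
    n = len(rates)
    pop = [[rng.randrange(n_classes) for _ in range(n)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda a: fitness(a, rates, n_classes), reverse=True)
        history.append(fitness(pop[0], rates, n_classes))
        parents = pop[: pop_size // 2]
        children = [pop[0][:]]  # elitism: always keep the current best
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Point mutation: reassign a position to a random class.
            child = [rng.randrange(n_classes) if rng.random() < mut_prob else g
                     for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=lambda a: fitness(a, rates, n_classes))
    return best, history

# Made-up per-pair rates that fall into roughly three groups.
toy_rates = [0.1, 0.12, 0.9, 1.0, 0.11, 0.95, 0.5, 0.52]
best, history = evolve(toy_rates, n_classes=3)
print(best, history[-1])
```

Because the best individual is always carried forward, the recorded best fitness never decreases across generations; in a real analysis the fitness would instead come from refitting the codon model for each candidate assignment.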

Cronfa at Swansea University

Stellenbosch University SUNScholar Repository

Viral Evolution and Cytotoxic T Cell Restricted Selection in Acute Infant HIV-1 Infection

Author: A Carvajal-Rodríguez
A Piantadosi
AA Palm
AJ McMichael
AJ Melvin
BA Richardson
BL Lohman
CF Thobakgale
CM Noviello
D Mbori-Ngacha
DD Panteleeff
E Adland
EM Obimbo
GC John-Stewart
GK Hightower
GM Jenkins
H Zhang
J Esbjörnsson
J Raghwani
J Warszawski
JM Carlson
JW Mellors
K Luzuriaga
KS Lole
M Anisimova
M Bunce
M Kearse
M Mild
MA Larkin
MKP Liu
N Casartelli
N Goonetilleke
N Kiwanuka
N Shaffer
P Borrow
P Lemey
PJR Goulder
R Shankarappa
R Spira
RB Markham
RC Edgar
S Alizon
S Emery
S Ganeshan
S Kumar
SLK Pond
SV Muse
T Pillay
V Sanchez-Merino
X Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Antiretroviral therapy-naive HIV-1 infected infants experience poor viral containment and rapid disease progression compared to adults. Viral factors (e.g. transmitted cytotoxic T-lymphocyte (CTL) escape mutations) or infant factors (e.g. reduced CTL functional capacity) may explain this observation. We assessed CTL functionality by analysing selection in CTL-targeted HIV-1 epitopes following perinatal infection. HIV-1 gag, pol and nef sequences were generated from a historical repository of longitudinal specimens from 19 vertically infected infants. Evolutionary rate and selection were estimated for each gene and in CTL-restricted and non-restricted epitopes. Evolutionary rate was higher in nef and gag vs. pol, and lower in infants with non-severe immunosuppression vs. severe immunosuppression across gag and nef. Selection pressure was stronger in infants with non-severe immunosuppression vs. severe immunosuppression across gag. The analysis also showed that infants with non-severe immunosuppression had stronger selection in CTL-restricted vs. non-restricted epitopes in gag and nef. Evidence of stronger CTL selection was absent in infants with severe immunosuppression. These data indicate that infant CTLs can exert selection pressure on gag and nef epitopes in early infection and that stronger selection across CTL epitopes is associated with favourable clinical outcomes. These results have implications for the development of paediatric HIV-1 vaccines.